Under maintenance

Pricing

Pay per usage

Sitemap URL Discovery (sitemap.xml + robots.txt → all URLs)

Given a domain, this actor finds sitemap.xml / sitemap_index.xml (directly or via robots.txt), recursively expands sitemap indexes, and returns one row per discovered URL with lastmod / changefreq / priority. Useful for SEO audits, crawl-target prep, and content cataloging. Pricing: $0.01 site fee + $0.0001 per URL.

Developer

Hojun Lee

Last modified

2 days ago

Why this exists

Before you scrape, audit, or index a site, you need to know what's there. The site's own sitemap is the authoritative list — but discovering it requires:

Checking common paths (sitemap.xml, sitemap_index.xml, wp-sitemap.xml)
Parsing robots.txt for Sitemap: directives
Recursively walking sitemap-index → child sitemaps
Parsing each one for <url> records

This actor does all of it with sane fallbacks and returns a summary row plus one row per discovered URL.
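
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actor's actual code: `discover`, `sitemaps_from_robots`, `entry_locs`, and the caller-supplied `fetch` function are hypothetical names, and the order in which candidates are tried is a guess.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlsplit

# Common sitemap locations named in the list above.
COMMON_PATHS = ("sitemap.xml", "sitemap_index.xml", "wp-sitemap.xml")


def sitemaps_from_robots(robots_txt):
    """Return the URLs named in 'Sitemap:' lines (the key is case-insensitive)."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def local_name(tag):
    """Strip the XML namespace: '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def entry_locs(root, entry_name):
    """Collect <loc> text from direct <url> or <sitemap> children of the root."""
    locs = []
    for entry in root:
        if local_name(entry.tag) != entry_name:
            continue
        for child in entry:
            if local_name(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return locs


def discover(site_url, fetch, max_sitemap_files=50):
    """Return (sitemaps scanned, page URLs found) for one site.

    fetch(url) is supplied by the caller (hypothetical) and returns the
    response body as text, or None when the URL is missing or unreadable.
    """
    parts = urlsplit(site_url)
    origin = f"{parts.scheme}://{parts.netloc}/"
    robots = fetch(urljoin(origin, "robots.txt")) or ""
    queue = sitemaps_from_robots(robots) + [urljoin(origin, p) for p in COMMON_PATHS]
    seen, scanned, page_urls = set(), [], []
    while queue and len(scanned) < max_sitemap_files:
        sitemap_url = queue.pop(0)
        if sitemap_url in seen:
            continue  # avoid loops and duplicate candidates
        seen.add(sitemap_url)
        body = fetch(sitemap_url)
        if body is None:
            continue
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            continue  # broken sitemap: skip it rather than abort the run
        scanned.append(sitemap_url)
        if local_name(root.tag) == "sitemapindex":
            queue.extend(entry_locs(root, "sitemap"))  # walk into child sitemaps
        else:
            page_urls.extend(entry_locs(root, "url"))
    return scanned, page_urls
```

Passing `fetch` in as a parameter keeps the walk independent of any HTTP library, so it can be exercised with a plain dictionary of canned responses (`discover("https://example.com", pages.get)`).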

What you get

Summary row

{
  "_type": "summary",
  "site_url": "https://www.apify.com",
  "sitemaps_scanned": 5,
  "sitemap_urls": [
    "https://www.apify.com/sitemap.xml",
    "https://www.apify.com/sitemap-index.xml",
    "https://www.apify.com/sitemap/actors1.xml",
    ...
  ],
  "urls_discovered": 12384
}

Per-URL row

{
  "_type": "url",
  "url": "https://www.apify.com/store/actors/...",
  "lastmod": "2026-06-08",
  "changefreq": "weekly",
  "priority": "0.7"
}
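
To show where the per-URL fields come from, here is a rough sketch that maps `<url>` records in a standard sitemap to rows of the shape above. The function name `url_rows` is hypothetical, and reporting absent optional fields as `None` is an assumption; the actor may omit them or use another placeholder.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
OPTIONAL_FIELDS = ("lastmod", "changefreq", "priority")


def url_rows(sitemap_xml):
    """Turn each <url> entry of a namespaced <urlset> into a per-URL row."""
    root = ET.fromstring(sitemap_xml)
    rows = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            continue  # a <url> without <loc> carries nothing to report
        row = {"_type": "url", "url": loc}
        for field in OPTIONAL_FIELDS:
            value = entry.findtext(f"sm:{field}", namespaces=NS)
            # Values stay strings, matching "priority": "0.7" in the sample row.
            row[field] = value.strip() if value else None
        rows.append(row)
    return rows
```

This sketch only matches sitemaps that declare the standard namespace; files without an `xmlns` attribute would need a namespace-agnostic lookup.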

Quick start

Discover all URLs on a domain

{
  "siteUrl": "https://www.apify.com"
}

Only product / actor pages

{
  "siteUrl": "https://www.apify.com",
  "pathContains": "/store/actors/",
  "maxUrls": 5000
}
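
One plausible reading of `pathContains` and `maxUrls` is a substring filter followed by a cap. The sketch below assumes the match is made against the URL path only (not the query string or host) and that the cap applies after filtering; neither detail is confirmed by this README, and `select_urls` is a hypothetical name.

```python
from itertools import islice
from urllib.parse import urlsplit


def select_urls(urls, path_contains=None, max_urls=None):
    """Lazily yield URLs whose path contains the substring, up to max_urls."""
    matches = (
        u for u in urls
        if path_contains is None or path_contains in urlsplit(u).path
    )
    # islice with a stop of None yields everything, so max_urls stays optional.
    return islice(matches, max_urls)
```

Because the filter is lazy, iteration stops as soon as the cap is reached: `list(select_urls(all_urls, "/store/actors/", 5000))`.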

Cap scan size for huge sites

{
  "siteUrl": "https://en.wikipedia.org",
  "maxUrls": 100000,
  "maxSitemapFiles": 50
}

Pricing

Pay-Per-Event:

$0.01 — flat fee per site (covers initial discovery)
$0.0001 — per URL row returned

Run	URLs	Cost
Small SaaS site	200	$0.03
Mid-sized blog	5,000	$0.51
Mega site	100,000	$10.01
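
The table follows directly from the two event prices. As a quick estimate, assuming the site fee is charged once per run and every returned URL row is billed (`run_cost` is a hypothetical helper, not part of the actor):

```python
from decimal import Decimal

SITE_FEE = Decimal("0.01")   # flat fee per site
PER_URL = Decimal("0.0001")  # per URL row returned


def run_cost(url_rows):
    """Estimated cost in USD for one site that returns url_rows rows."""
    if url_rows < 0:
        raise ValueError("url_rows must be non-negative")
    # Decimal avoids the rounding noise that binary floats add to small prices.
    return SITE_FEE + PER_URL * url_rows
```

`run_cost(200)`, `run_cost(5000)`, and `run_cost(100000)` reproduce the three figures in the table.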

For one-off audits, compare this with an annual license for Screaming Frog SEO Spider ($259/yr) or Sitebulb ($175/yr).

Use cases

SEO audit — Pull every URL with its lastmod; find stale content
Crawl planning — Feed URLs into Web → Markdown or your own scraper
Content monitoring — Detect new URLs by comparing snapshots over time
Competitor research — See what a competitor's catalog looks like
Sitemap sanity check — Verify sitemap-index works; catch broken nested sitemaps
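
The SEO audit and content monitoring cases come down to simple post-processing of the per-URL rows. A minimal sketch, assuming rows shaped like the per-URL sample and `lastmod` values that start with a `YYYY-MM-DD` date; `new_urls`, `stale_urls`, and the 365-day threshold are illustrative choices, not features of the actor:

```python
from datetime import date, timedelta


def new_urls(previous_rows, current_rows):
    """URLs present in the current snapshot but absent from the previous one."""
    before = {row["url"] for row in previous_rows}
    return sorted({row["url"] for row in current_rows} - before)


def stale_urls(rows, today, max_age_days=365):
    """URLs whose lastmod is older than max_age_days, in input order."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for row in rows:
        lastmod = row.get("lastmod")
        if not lastmod:
            continue  # no lastmod: age unknown, so not reported as stale
        try:
            # Keep only the date part of values like 2026-06-08T10:00:00+00:00.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # skip year-only or otherwise unparseable values
        if modified < cutoff:
            stale.append(row["url"])
    return stale
```

Both helpers take plain lists of dictionaries, so two exported datasets from runs on different days can be compared directly.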

Limitations

No HTML scraping fallback: if a site has no sitemap (rare for serious sites), this actor returns 0 URLs. For HTML link crawling, use a crawl-specific actor.
Doesn't honor noindex: a URL listed in the sitemap might still be marked noindex in its HTML; this actor returns whatever the sitemap lists.

Related actors

Web Page → Markdown Converter — Convert discovered URLs to text
HTML Metadata Extractor — Pull meta tags from each URL
PDF Text Extractor
JSON Schema Generator

Feedback

A short review helps SEO engineers find it: Leave a review on Apify Store
